440 research outputs found

    Recent advances in RNA sequence analysis

    The latest high-throughput DNA sequencing technology can now be applied on a large scale to capture the complete set of mRNA transcripts in a cell, using a technique called RNA-seq. Although RNA-seq is only 2 years old, it has rapidly swept through the field of genomics, and it is now being used to analyze the transcriptomes of organisms ranging from bacteria to primates. The depth of sequencing allows researchers to quantify the level of expression of genes, to discover alternative isoforms in eukaryotic species, and even to characterize the operon structure of bacterial genomes

    Between a chicken and a grape: estimating the number of human genes

    The number of genes in the human genome is still an estimate

    Genomic Features Of A Bumble Bee Symbiont Reflect Its Host Environment

    Here, we report the genome of one gammaproteobacterial member of the gut microbiota, for which we propose the name "Candidatus Schmidhempelia bombi," that was inadvertently sequenced alongside the genome of its host, the bumble bee, Bombus impatiens. This symbiont is a member of the recently described bacterial order Orbales, which has been collected from the guts of diverse insect species; however, "Ca. Schmidhempelia" has been identified exclusively with bumble bees. Metabolic reconstruction reveals that "Ca. Schmidhempelia" lacks many genes for a functioning NADH dehydrogenase I, all genes for the high-oxygen cytochrome o, and most genes in the tricarboxylic acid (TCA) cycle. "Ca. Schmidhempelia" has retained NADH dehydrogenase II, the low-oxygen specific cytochrome bd, anaerobic nitrate respiration, mixed-acid fermentation pathways, and citrate fermentation, which may be important for survival in low-oxygen or anaerobic environments found in the bee hindgut. Additionally, a type 6 secretion system, a Flp pilus, and many antibiotic/multidrug transporters suggest complex interactions with its host and other gut commensals or pathogens. This genome has signatures of reduction (2.0 megabase pairs) and rearrangement, as previously observed for genomes of host-associated bacteria. A survey of wild and laboratory B. impatiens revealed that "Ca. Schmidhempelia" is present in 90% of individuals and, therefore, may provide benefits to its host. Funding: Center for Insect Science (University of Arizona); National Science Foundation NSF 1046153; NIH Director's Pioneer 1DP1OD006416-01; NIH R01-HG006677; Swiss National Science Foundation 140157, 147881; Integrative Biolog

    Detection and correction of false segmental duplications caused by genome mis-assembly

    A method for determining false segmental duplications in vertebrate genomes, thus correcting mis-assemblies and providing more accurate estimates of duplications

    An empirical analysis of training protocols for probabilistic gene finders

    BACKGROUND: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation of GHMM implementations. While prevailing methods for modeling and parsing genes using GHMMs have been described in the literature, little attention has been paid as of yet to their proper training. The few hints available in the literature together with anecdotal observations suggest that most practitioners perform maximum likelihood parameter estimation only at the local submodel level, and then attend to the optimization of global parameter structure using some form of ad hoc manual tuning of individual parameters. RESULTS: We decided to investigate the utility of applying a more systematic optimization approach to the tuning of global parameter structure by implementing a global discriminative training procedure for our GHMM-based gene finder. Our results show that significant improvement in prediction accuracy can be achieved by this method. CONCLUSIONS: We conclude that training of GHMM-based gene finders is best performed using some form of discriminative training rather than simple maximum likelihood estimation at the submodel level, and that generalized gradient ascent methods are suitable for this task. We also conclude that partitioning of training data for the twin purposes of maximum likelihood initialization and gradient ascent optimization appears to be unnecessary, but that strict segregation of test data must be enforced during final gene finder evaluation to avoid artificially inflated accuracy measurements

    2009 Swine-Origin Influenza A (H1N1) Resembles Previous Influenza Isolates

    Background: In April 2009, novel swine-origin influenza viruses (S-OIV) were identified in patients from Mexico and the United States. The viruses were genetically characterized as a novel influenza A (H1N1) strain originating in swine, and within a very short time the S-OIV strain spread across the globe via human-to-human contact. Methodology: We conducted a comprehensive computational search of all available sequences of the surface proteins of H1N1 swine influenza isolates and found that a similar strain to S-OIV appeared in Thailand in 2000. The earlier isolates caused infections in pigs but only one sequenced human case, A/Thailand/271/2005 (H1N1). Significance: Differences between the Thai cases and S-OIV may help shed light on the ability of the current outbreak strain to spread rapidly among humans

    Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake

    BACKGROUND: In many prokaryotes, transcription of DNA to RNA is terminated by a thymine-rich stretch of DNA following a hairpin loop. Detecting such Rho-independent transcription terminators can shed light on the organization of bacterial genomes and can improve genome annotation. Previous computational methods to predict Rho-independent terminators have been slow or limited in the organisms they consider. RESULTS: We describe TransTermHP, a new computational method to rapidly and accurately detect Rho-independent transcription terminators. We predict the locations of terminators in 343 prokaryotic genomes, representing the largest collection of predictions available. In Bacillus subtilis, we can detect 93% of known terminators with a false positive rate of just 6%, comparable to the best-known methods. Outside the Firmicutes division, we find that Rho-independent termination plays a large role in the Neisseria and Vibrio genera, the Pasteurellaceae (including the Haemophilus genus) and several other species. In Neisseria and Pasteurellaceae, terminator hairpins are frequently formed by closely spaced, complementary instances of exogenous DNA uptake signal sequences. We quantify the propensity for terminators to include these sequences. In the process, we provide the first discussion of potential uptake signals in Haemophilus ducreyi and Mannheimia succiniciproducens, and we discuss the preference for a particular configuration of uptake signal sequences within terminators. CONCLUSION: Our new fast and accurate method for detecting transcription terminators has allowed us to identify and analyze terminators in many new genomes and to identify DNA uptake signal sequences in several species where they have not been previously reported. Our software and predictions are freely available
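
    The signal described in this abstract, a hairpin followed by a thymine-rich stretch, can be illustrated with a deliberately naive scan. This is a minimal sketch and not the TransTermHP algorithm: it only accepts perfectly complementary stems, applies no energy or scoring model, and searches one strand. The function name and all parameters (stem_len, loop_min, loop_max, tail_len, min_t) are hypothetical choices made for illustration.

```python
# Naive illustration only: perfect-stem hairpin followed by a T-rich tail.
# All names and thresholds below are assumptions, not taken from TransTermHP.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_terminators(seq, stem_len=6, loop_min=3, loop_max=8,
                               tail_len=8, min_t=6):
    """Return (start, stem, loop, tail) tuples for candidate terminators.

    A candidate is a left stem of stem_len bases, a loop of loop_min to
    loop_max bases, a right stem that is the exact reverse complement of
    the left stem, and a tail_len window containing at least min_t T's.
    """
    seq = seq.upper()
    hits = []
    shortest = 2 * stem_len + loop_min + tail_len
    for start in range(len(seq) - shortest + 1):
        left = seq[start:start + stem_len]
        target = reverse_complement(left)
        for loop_len in range(loop_min, loop_max + 1):
            right_start = start + stem_len + loop_len
            right_end = right_start + stem_len
            tail = seq[right_end:right_end + tail_len]
            if len(tail) < tail_len:
                break  # longer loops would run even further past the end
            if seq[right_start:right_end] == target and tail.count("T") >= min_t:
                loop = seq[start + stem_len:right_start]
                hits.append((start, left, loop, tail))
    return hits


if __name__ == "__main__":
    example = "ACAC" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTTTT" + "AAAA"
    for hit in find_candidate_terminators(example):
        print(hit)
```

    A practical predictor would also need to tolerate mismatches and bulges in the stem, score hairpin stability, and scan both strands; the sketch leaves all of that out to keep the hairpin-plus-T-tail structure visible.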

    A clustering method for repeat analysis in DNA sequences

    BACKGROUND: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats. RESULTS: The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi fasta), that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences. CONCLUSIONS: We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder), should prove helpful in the analysis of repeat structure for both complete and partial genome sequences

    16GT: A fast and sensitive variant caller using a 16-genotype probabilistic model

    © The Author 2017. Published by Oxford University Press. 16GT is a variant caller for Illumina whole-genome and whole-exome sequencing data. It uses a new 16-genotype probabilistic model to unify single nucleotide polymorphism and insertion and deletion calling in a single variant calling algorithm. In benchmark comparisons with 5 other widely used variant callers on a modern 36-core server, 16GT demonstrated improved sensitivity in calling single nucleotide polymorphisms, and it provided comparable sensitivity and accuracy for calling insertions and deletions as compared to the GATK HaplotypeCaller. 16GT is available at https://github.com/aquaskyline/16GT.